REMARKS 

In view of the following discussion, the Applicants submit that none of the claims 
now pending in the application is anticipated under the provisions of 35 U.S.C. §102. 
Thus, the Applicants believe that all of these claims are now in allowable form. 

I. REJECTION OF CLAIMS 1, 7-11 AND 17-21 UNDER 35 U.S.C. §102 

Claims 1, 7-11 and 17-21 stand rejected as being anticipated by the Nitta et al. 
patent (U.S. 4,881,266, hereinafter "Nitta"). In response, the Applicants have amended 
independent claims 1, 11 and 21, from which claims 2-3, 7-10, 12-13 and 17-20 
depend, in order to more clearly recite aspects of the present invention. 

The Applicants respectfully direct the Examiner's attention to the fact that Nitta 
fails to disclose or suggest the novel invention of producing and providing an endpoint 
signal corresponding to the occurrence of at least one speech endpoint to a speech 
recognition application, along with a speech signal associated with the endpoint signal, 
for subsequent recognition of the associated speech signal, as claimed in Applicants' 
amended independent claims 1, 11 and 21. 

In contrast, Nitta teaches providing a pattern match detector with a plurality of 
word frame boundary intervals representing potential intervals of speech in an audio 
signal. Thus, Nitta fails to anticipate Applicants' invention. 

Specifically, Nitta teaches a speech recognition system that identifies potential 
endpoints in an input speech signal based on calculated sound power (i.e., the total 
sound energy emitted by a source per unit time) at various points in the speech signal. 
In particular, potential endpoints are identified at points in the speech signal where the 
sound power exceeds (e.g., a starting point) or falls below (e.g., an ending point) a 
given threshold for a certain duration of time. Portions of the speech signal bounded by 
starting and ending points are identified as word "intervals", and feature parameters 
from these word intervals are sampled. The sampled feature parameters are compared 
to sampled feature points extracted from melcepstral coefficients corresponding to the 
speech signal, and metrics of this comparison are provided to a discriminator for sorting 
and output as a recognition result. 
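
For illustration only, the threshold-based interval detection described above can be sketched as follows. This is a simplified reading of the description given in these remarks, not code taken from Nitta: the function name `find_word_intervals`, the per-frame `power` list, and the `min_frames` duration parameter are all hypothetical.

```python
from typing import List, Tuple


def find_word_intervals(
    power: List[float], threshold: float, min_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame pairs bounding candidate word intervals.

    A starting point is confirmed once power stays above `threshold` for
    `min_frames` consecutive frames; an ending point is confirmed once power
    stays at or below it for `min_frames` consecutive frames. `end` is the
    first frame of the confirmed below-threshold run (exclusive bound).
    """
    intervals: List[Tuple[int, int]] = []
    in_speech = False   # current confirmed state
    start = 0           # confirmed starting point of the open interval
    run_start = 0       # first frame of the pending state change
    run_len = 0         # how many frames the pending change has lasted

    for i, p in enumerate(power):
        above = p > threshold
        if above != in_speech:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_frames:
                if above:
                    start = run_start
                else:
                    intervals.append((start, run_start))
                in_speech = above
                run_len = 0
        else:
            # The pending change did not last long enough; discard it.
            run_len = 0

    if in_speech:
        # Signal ended while still inside an interval.
        intervals.append((start, len(power)))
    return intervals


if __name__ == "__main__":
    frame_power = [0, 0, 5, 5, 5, 5, 0, 5, 5, 0, 0, 0, 5, 0]
    print(find_word_intervals(frame_power, threshold=1.0, min_frames=2))
```

In this sketch the one-frame dip at frame 6 and the one-frame spike at frame 12 are ignored because neither persists for `min_frames` frames, so the decision rests on power and duration alone.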



09/18/2006 16:39 FAX 732 530 9808   PATTERSON & SHERIDAN - PTO 

09/829,831 

Thus, Nitta teaches a method that, at best, provides similarity measurements 
relating to pre-segmented portions of a speech signal to a speech recognizer, which 
produces a recognition result by sorting these measurements. This is not the same as 
providing the speech recognizer with an endpoint signal (e.g., a binary or continuously 
generated signal) corresponding to the occurrence of at least one endpoint in a speech 
signal, e.g., in order to facilitate subsequent signal segmentation and processing by a 
speech recognition application. Nitta thus fails to anticipate a method for processing an 
input speech signal wherein a speech endpoint signal is produced and provided, along 
with the input speech signal, to a speech recognition application for recognition of the 
input speech signal, as positively claimed by the Applicants in claims 1, 11 and 21. 

Specifically, Applicants' claims 1, 11 and 21 positively recite: 



1. A method for processing a speech signal comprising: 

extracting prosodic features from a speech signal; 

modeling the prosodic features to identify at least one speech endpoint; 

producing an endpoint signal corresponding to the occurrence of the at least one 
speech endpoint; and 

providing the endpoint signal and the speech signal to a speech recognition 
application to facilitate subsequent recognition of the speech signal. (Emphasis added) 



11. Apparatus for processing a speech signal comprising: 

a prosodic feature extractor for extracting prosodic features from the speech 
signal; 

a prosodic feature analyzer for modeling the prosodic features to identify at least 
one speech endpoint; 

an endpoint signal producer that produces an endpoint signal, corresponding to 
the occurrence of the at least one speech endpoint; and 

means for providing the endpoint signal and the speech signal to a speech 
recognition application to facilitate subsequent recognition of the speech signal. 
(Emphasis added) 

21. An electronic storage medium for storing a program that, when executed by a 
processor, causes a system to perform a method for processing a speech signal 
comprising: 

extracting prosodic features from a speech signal; 
modeling the prosodic features to identify at least one speech endpoint; 
producing an endpoint signal corresponding to the occurrence of the at least one 
speech endpoint; and 

providing the endpoint signal and the speech signal to a speech recognition 
application to facilitate subsequent recognition of the speech signal. (Emphasis Added) 

In one embodiment, the Applicants' invention is directed to a method for applying 
prosody-based endpointing to a speech signal. Conventional speech processing 
techniques that are used to provide signals, based on spoken words or commands 
(e.g., for controlling devices or software programs), typically are characterized by an 
inability or difficulty in locating suitable speech segments within the spoken input for 
processing. Typical endpointing techniques identify the completion of a speech 
segment or utterance by measuring pauses in the given speech signal. However, since 
spoken language is not typically produced with such explicit indicators, typical 
endpointing techniques may misinterpret normal fluctuations in the rhythm of speech, 
such as mid-sentence pauses, to indicate the completion of an utterance. The resultant 
translation of a spoken command may therefore be fraught with inaccuracies. 

The Applicants' invention facilitates the translation of spoken input by extracting 
and modeling the prosodic features of an input speech signal in order to identify at least 
one endpoint in the input speech signal. Output is produced in the form of an endpoint 
signal that represents the occurrence of the identified endpoint in the input speech 
signal. Both the input speech signal and the generated endpoint signal are then 
provided to a separate speech recognition application that uses the endpoint signal to 
facilitate segmentation and subsequent word recognition of the input speech signal. 
The resultant translated speech thus more accurately reflects the spoken input. 
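
Purely as an illustration of the data flow described above, the sketch below builds a binary endpoint signal aligned with the speech signal and passes both to a stand-in for the separate recognition application. It assumes the endpoint positions have already been identified; the prosodic modeling itself is not shown. The names `make_endpoint_signal` and `segment_with_endpoints`, and the choice of one binary flag per speech frame, are hypothetical and are not taken from the application.

```python
from typing import List, Sequence


def make_endpoint_signal(num_frames: int, endpoint_frames: Sequence[int]) -> List[int]:
    """Produce a binary signal with 1 at each identified endpoint frame."""
    signal = [0] * num_frames
    for frame in endpoint_frames:
        signal[frame] = 1
    return signal


def segment_with_endpoints(
    speech: Sequence[float], endpoint_signal: Sequence[int]
) -> List[List[float]]:
    """Stand-in for the recognition application's segmentation step.

    It receives the unmodified speech signal together with the endpoint
    signal and closes a segment at every frame flagged as an endpoint.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    for sample, flag in zip(speech, endpoint_signal):
        current.append(sample)
        if flag:
            segments.append(current)
            current = []
    if current:
        # Trailing speech after the last endpoint forms a final segment.
        segments.append(current)
    return segments


if __name__ == "__main__":
    speech = [0.1, 0.2, 0.3, 0.4, 0.5]
    endpoint_signal = make_endpoint_signal(len(speech), endpoint_frames=[2])
    print(endpoint_signal)
    print(segment_with_endpoints(speech, endpoint_signal))
```

The point of the sketch is only that the recognizer receives two inputs, the speech signal and a separate endpoint signal, rather than pre-computed similarity measurements.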

As discussed above, Nitta does not produce or provide an endpoint signal to a 
speech recognition application, but rather provides a speech recognizer with a plurality 
of similarity measurements relating to sampled segments of an input speech signal. 
Therefore, the Applicants submit that independent claims 1, 11 and 21 fully satisfy the 
requirements of 35 U.S.C. §102 and are patentable thereunder. 

Dependent claims 7-10 and 17-20 depend from claims 1 and 11, respectively, and recite 
additional features therefor. As such, and for at least the same reasons set forth 
above, the Applicants submit that claims 7-10 and 17-20 are not anticipated by the 
teachings of Nitta. Therefore, the Applicants submit that dependent claims 7-10 and 
17-20 also fully satisfy the requirements of 35 U.S.C. §102 and are patentable thereunder. 

II. ALLOWABLE SUBJECT MATTER 

The Applicants thank the Examiner for his comments regarding the allowance of 
claims 22-24. Additionally, the Applicants thank the Examiner for his comments 
regarding the allowability of claims 2-6 and 12-16, if rewritten into independent form 
including all of the limitations of the base claim and any intervening claims. However, in 
light of the above arguments, the Applicants respectfully submit that claims 1 and 11, 
from which claims 2-6 and 12-16 respectively depend, are currently in allowable form, 
and, as such, claims 2-6 and 12-16 are in allowable form as they stand. 

III. CONCLUSION 

Thus, the Applicants submit that all of the presented claims now fully satisfy the 
requirements of 35 U.S.C. §102. Consequently, the Applicants believe that all of these 
claims are presently in condition for allowance. Accordingly, both reconsideration of this 
application and its swift passage to issue are earnestly solicited. 

If, however, the Examiner believes that there are any unresolved issues requiring 
the issuance of a final action in any of the claims now pending in the application, it is 
requested that the Examiner telephone Mr. Kin-Wah Tong, Esq. at (732) 530-9404 so 
that appropriate arrangements can be made for resolving such issues as expeditiously 
as possible. 



Respectfully submitted, 



Date 





(732) 530-9404 



Patterson & Sheridan, LLP 
595 Shrewsbury Avenue 
Shrewsbury, New Jersey 07702 